Abstract

Multi-manifold modeling is increasingly used in segmen- 
tation and data representation tasks in computer vision and 
related fields. While the general problem, modeling data 
by mixtures of manifolds, is very challenging, several ap- 
proaches exist for modeling data by mixtures of affine sub- 
spaces (which is often referred to as hybrid linear mod- 
eling). We translate some important instances of multi- 
manifold modeling to hybrid linear modeling in embedded 
spaces, without explicitly performing the embedding but ap- 
plying the kernel trick. The resulting algorithm, Kernel 
Spectral Curvature Clustering, uses kernels at two levels 
- both as an implicit embedding method to linearize nonflat 
manifolds and as a principled method to convert a multi- 
way affinity problem into a spectral clustering one. We 
demonstrate the effectiveness of the method by comparing 
it with other state-of-the-art methods on both synthetic data 
and a real-world problem of segmenting multiple motions 
from two perspective camera views. 

Supp. webpage: http://www.math.umn.edu/~lerman/kscc/

1. Introduction 

Recently a lot of attention has been focused on multi- 
manifold modeling [1, 24, 20, 13, 25, 16, 15, 7, 9, 4, 5, 8, 2]. 
In a typical setting data is sampled from a mixture of distri- 
butions approximated by manifolds (e.g., quadratic surfaces 
in two-view geometries [18]), and the task is to segment 
the data into different clusters representing the manifolds. 
This is a common yet challenging problem in many applica- 
tions such as computer vision, face recognition, and image 
processing. A well-known example is the clustering of the 
MNIST handwritten digits [14], where all the images of a 
given digit live on a distinct manifold. 

*This work was supported by NSF grants #0612608 and #0915064. 



Due to the nature of manifolds, most algorithms analyze 
the local geometry of sampled data, such as density, dimen- 
sion and orientation, and then piece together those local 
similarities to find the correct clusters [20, 13, 7, 9, 8, 2]. 
For example, Goldberg et al. [8] estimate local Gaussian 
models around each data point and apply spectral cluster- 
ing [17] according to the Hellinger distances between those 
local models. A different local approach is used by K-Manifolds [20],
which iteratively clusters data into mani-
folds via expectation-maximization, i.e., first approximat- 
ing each cluster by a manifold using a node-weighted mul- 
tidimensional scaling (while using local neighbors to esti- 
mate geodesic distances), and next assigning data points to 
the closest manifold from the former stage. These methods 
are sensitive to a number of factors, such as size of local 
neighborhoods and density of sampled data, and thus are 
expected to perform poorly when data is sparsely sampled 
(see e.g., Figs. 2 and 3). 

When only flats, i.e., affine subspaces, are used to model
the clusters, the corresponding problem, referred to as hy- 
brid linear modeling, is much easier to deal with because 
there are elegant representations for flats that can be utilized 
for solving the problem. For example, Generalized Princi- 
pal Component Analysis (GPCA) [24, 16] uses polynomi- 
als to represent linear subspaces, Local Subspace Affinity 
(LSA) [25] computes an affinity for any pair of points using 
the distance between their local tangent subspaces and then 
applies spectral clustering [17], Agglomerative Lossy Com- 
pression (ALC) [15] measures the number of bits needed to 
code the data by general flats (up to a pre-specified distor- 
tion), and Spectral Curvature Clustering (SCC) [5, 4] com- 
putes a flatness measure for each fixed-size subset of the 
data. Finally, there are algorithms that use the linear struc- 
ture and iterate between a data clustering step and a sub- 
space estimation step, e.g., K-Flats [12, 11, 3, 23] and Mix-
tures of Probabilistic PCA (MoPPCA) [21]. 

Figure 1. Two circles in $\mathbb{R}^2$ and their images under the map $\Phi$ (defined in Eq. (1)) in $\mathbb{R}^3$.

In this work we focus our attention on multi-manifold modeling with parametric surfaces. Our simple but effective idea is to convert the problem into hybrid linear modeling by embedding the underlying (parametric) surfaces into a higher-dimensional space where they become flats.
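For example, when the data is sampled from a union of $(D-1)$-dimensional hyperspheres in the Euclidean space $\mathbb{R}^D$, the following function maps them to $D$-dimensional flats in $\mathbb{R}^{D+1}$:

$$\Phi(\mathbf{x}) = (\mathbf{x}', \|\mathbf{x}\|_2^2)', \quad \forall\, \mathbf{x} \in \mathbb{R}^D. \tag{1}$$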

Fig. 1 illustrates this example for D = 2. When dealing 
with parametric surfaces, it is possible to apply hybrid lin- 
ear modeling algorithms (e.g., [3, 11, 24, 25, 15, 5]) in the 
embedded space to segment the original manifolds. 

If a hybrid linear modeling algorithm can be expressed only in terms of the dot products between the data points (e.g., K-Flats [12, 11, 3, 23], MoPPCA [21], LSA [25], ALC [15] and SCC [5, 4]), then the explicit embedding can be avoided by using the kernel trick. A kernel is a real-valued function, $k(\mathbf{x}, \mathbf{y})$, of two variables $\mathbf{x}, \mathbf{y} \in \mathbb{R}^D$ such that for any $N$ points $\mathbf{x}_1, \ldots, \mathbf{x}_N$ in $\mathbb{R}^D$, the kernel matrix

$$\mathbf{K} := \{k(\mathbf{x}_i, \mathbf{x}_j)\}_{1 \le i, j \le N} \tag{2}$$

is symmetric positive semidefinite. It is shown in [19] that any kernel function can be represented as a dot product

$$k(\mathbf{x}, \mathbf{y}) = \langle \Phi(\mathbf{x}), \Phi(\mathbf{y}) \rangle, \quad \forall\, \mathbf{x}, \mathbf{y} \in \mathbb{R}^D, \tag{3}$$

where $\Phi : \mathbb{R}^D \to \mathcal{F}$ and $\mathcal{F}$ is a Hilbert space. The map $\Phi$ is referred to as a feature map and the space $\mathcal{F}$ as a feature space. Since we know the desired embedding $\Phi$, we can form the appropriate kernel $k$ by Eq. (3) and replace dot products with $k$ in applicable hybrid linear modeling algorithms.
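The relation between an explicit embedding and its kernel can be checked numerically. The following is a minimal sketch, assuming the hypersphere map of Eq. (1) and the kernel it induces through Eq. (3); the function names, the random test points and the seed are illustrative choices, not part of the original method.

```python
import numpy as np


def spherical_feature_map(x):
    """Map x in R^D to (x, ||x||^2) in R^(D+1), following Eq. (1)."""
    x = np.asarray(x, dtype=float)
    return np.append(x, x @ x)


def spherical_kernel(x, y):
    """Kernel induced by the map above via Eq. (3): x.y + ||x||^2 * ||y||^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return x @ y + (x @ x) * (y @ y)


def kernel_matrix(points, kernel):
    """Kernel matrix of Eq. (2) for the rows of `points`."""
    n = len(points)
    return np.array([[kernel(points[i], points[j]) for j in range(n)]
                     for i in range(n)])


rng = np.random.default_rng(0)          # arbitrary seed
points = rng.normal(size=(6, 2))        # six hypothetical points in R^2
K = kernel_matrix(points, spherical_kernel)
F = np.array([spherical_feature_map(p) for p in points])

max_gap = np.abs(K - F @ F.T).max()     # kernel vs. explicit dot products
min_eig = np.linalg.eigvalsh(K).min()   # nonnegative up to round-off
```

Here `max_gap` compares the kernel matrix with the Gram matrix of the explicitly embedded points, and `min_eig` checks positive semidefiniteness up to floating-point error.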

In this paper we concentrate on the kernelization of the 
SCC algorithm [4, 5], which we refer to as Kernel Spectral 
Curvature Clustering (KSCC). The main reason for choos- 
ing SCC is that the current implementations of other hybrid 
linear modeling algorithms that are appropriate for kernel-
ization [21, 3, 11, 25, 15] do not perform sufficiently well on
affine subspaces (unlike linear subspaces); see e.g., Fig. 4 
and [5, Table 2]. Another important reason is that SCC has 



established theoretical guarantees [4] and careful numeri- 
cal estimates [5] which can be used to justify successes and 
failures of KSCC. 

The rest of this paper is organized as follows. We first 
present the KSCC algorithm in Section 2. Experiments are 
then conducted in Section 3 to test the algorithm on both 
artificial data and a real-world problem of two-view mo- 
tion segmentation (The Matlab codes and relevant data can 
be found at the supplemental webpage). Finally, Section 4 
concludes with a brief discussion and open directions for 
future work. 

2. The KSCC algorithm 

Briefly speaking, the KSCC algorithm is the SCC algo- 
rithm [4, 5] performed in some user-specified feature space. 
However, all the relevant calculations in the feature space 
are accomplished in the original space via the corresponding kernel function. Thus, computations in the possibly high-dimensional feature space are avoided, which saves time.

We assume a data set $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ sampled from a collection of $K$ manifolds in $\mathbb{R}^D$ (possibly corrupted with noise and outliers). We will represent the data by a mixture of $K$ parametric surfaces of the same model (e.g., general conic sections in $\mathbb{R}^2$). Based on the given model of parametric surfaces, we form a feature map $\Phi : \mathbb{R}^D \to \mathbb{R}^L$ such that the images of the $K$ parametric surfaces are flats (see examples in Section 3). Let $\ell$ be the maximal dimension of the flats. We remark that $\ell$ can be determined by subtracting 1 from the maximal number of affinely independent coordinates of $\Phi$ (though future work will explore substantial reduction of $\ell$, whenever possible, via a feature selection procedure). We then segment the original manifolds by clustering $\ell$-flats, i.e., $\ell$-dimensional flats, in $\mathbb{R}^L$. In practice, we use a kernel matrix $\mathbf{K}$ which is implicitly formed by the hidden embedding $\Phi$ according to Eqs. (2) and (3).

The KSCC algorithm starts by computing a polar curvature [5] for any $\ell + 2$ points in the feature space via the kernel trick. Roughly speaking, the polar curvature is an $\ell$-dimensional flatness measure (in particular, it is zero for $\ell + 2$ points lying on an $\ell$-flat). More formally, it is the $\ell_2$ average of the polar sines at all vertices of the corresponding $(\ell + 1)$-simplex in the feature space, multiplied by the diameter of that simplex.

For any set of $\ell + 2$ points in the original space with indices $I = \{i_1, \ldots, i_{\ell+2}\}$, we denote the corresponding block of the kernel matrix $\mathbf{K}$ by $\mathbf{K}_{I,I}$ (and similarly later wherever applicable), that is,

$$\mathbf{K}_{I,I} := (K_{i_s, i_t})_{1 \le s, t \le \ell+2}. \tag{4}$$

The KSCC algorithm computes the polar curvature of their corresponding feature vectors in the following way (see supplementary material for derivation of this formula):

$$c_p^2(I) = \frac{1}{\ell+2} \cdot \max_{i,j \in I} \left(K_{ii} + K_{jj} - 2K_{ij}\right) \cdot \sum_{i \in I} \frac{\det(\mathbf{K}_{I,I} + \mathbf{1})}{\prod_{j \in I,\, j \neq i} \left(K_{ii} + K_{jj} - 2K_{ij}\right)}. \tag{5}$$

If any denominator above is zero then the algorithm assigns the value 0 to the polar curvature (this happens only when two points with indices in $I$ coincide in the feature space).
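As an illustration, the squared polar curvature can be evaluated from a kernel block alone. The sketch below assumes that the symbol 1 in the determinant denotes the all-ones matrix of matching size (so that the determinant equals the squared, factorial-scaled simplex volume) and that the formula returns the squared curvature; the function name and the two toy point sets are hypothetical.

```python
import numpy as np


def squared_polar_curvature(K_block):
    """Squared polar curvature of n = l + 2 feature points, given their kernel block."""
    K_block = np.asarray(K_block, dtype=float)
    n = K_block.shape[0]
    diag = np.diag(K_block)
    # Squared pairwise distances in feature space: K_ii + K_jj - 2 K_ij.
    sq_dist = diag[:, None] + diag[None, :] - 2.0 * K_block
    off_diag = ~np.eye(n, dtype=bool)
    if np.any(sq_dist[off_diag] <= 0.0):
        return 0.0  # two feature points coincide: curvature set to zero
    # Assumption: "1" is the all-ones matrix; clip tiny negative round-off.
    vol_term = max(np.linalg.det(K_block + np.ones((n, n))), 0.0)
    # For each vertex i, the product of squared edge lengths leaving i.
    edge_prods = np.array([np.prod(sq_dist[i, off_diag[i]]) for i in range(n)])
    return sq_dist.max() / n * np.sum(vol_term / edge_prods)


# Toy check with the linear kernel (feature map = identity) and l = 1.
right = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # right triangle
line = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])    # collinear points
c2_right = squared_polar_curvature(right @ right.T)
c2_line = squared_polar_curvature(line @ line.T)
```

For the right triangle the squared polar sines at the three vertices are 1, 1/2 and 1/2 and the squared diameter is 2, so `c2_right` equals 4/3; the collinear points lie on a 1-flat, so `c2_line` vanishes up to round-off.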

The KSCC algorithm then assigns to any distinct $\ell + 2$ points (with index set $I$) the following affinity:

$$\mathcal{A}_p(I) := e^{-c_p^2(I)/(2\sigma^2)}, \tag{6}$$

where $\sigma > 0$ is a tuning parameter (the affinity is set to zero if the points are not distinct). This function is expected to assign large values (toward 1) for points sampled from the same parametric surface and small values (toward 0) for points sampled from different surfaces. Its computation is solely based on $\mathbf{K}$ without invoking directly the mapping $\Phi$.

The KSCC algorithm next forms pairwise weights $\mathbf{W}$ from the above multi-way affinities:

$$W_{ij} = \sum_{J} \mathcal{A}_p(i, J) \cdot \mathcal{A}_p(j, J), \tag{7}$$

where the sum is over

$$\{J = (i_1, \ldots, i_{\ell+1}) \mid 1 \le i_1, \ldots, i_{\ell+1} \le N\}. \tag{8}$$

Finally, it applies spectral clustering [17] (with $\mathbf{W}$) to find the clusters.

We have thus far described the main steps of the KSCC algorithm in theory. However, due to its polynomial complexity $O(N^{\ell+2})$, the practical implementation of the algorithm will rely on the numerical strategies developed in [5], in particular, the iterative sampling procedure for estimating the matrix $\mathbf{W}$. This procedure is fundamental to the practical implementation and in fact makes KSCC a randomized algorithm, unlike its brief description above. We thus provide its details below.

The zeroth iteration of the iterative sampling starts with a random sample of $c \ll N^{\ell+1}$ $(\ell+1)$-tuples of points from the data $X$, with index sets $J_1, \ldots, J_c$. It then uses them to estimate the weights $\mathbf{W}$ of Eq. (7) as follows:

$$W_{ij} \approx \sum_{r=1}^{c} \mathcal{A}_p(i, J_r) \cdot \mathcal{A}_p(j, J_r). \tag{9}$$

Based on the above weights, $K$ initial clusters are obtained by spectral clustering [17]. The first iteration then resamples $c/K$ $(\ell+1)$-tuples of points from each of the $K$ previously found clusters to get a better estimate of $\mathbf{W}$ so that $K$ newer (and supposedly better) clusters are found. This procedure is repeated until convergence in order to obtain the best segmentation. We remark that the initial sampling might be critical to good final segmentation and, as $\ell$ becomes larger, it is increasingly difficult to sample enough "useful" $(\ell+1)$-tuples of points at the initial step.
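The estimate of the weights from sampled tuples can be sketched as follows, assuming the squared polar curvatures have already been arranged in an N-by-c array (one column per sampled tuple). The toy curvature values, the choice of sigma, and the normalized-eigenvector embedding (in the spirit of the spectral clustering of [17]) are illustrative assumptions; the final clustering of the embedded rows (e.g., by K-means) is omitted.

```python
import numpy as np


def affinities_from_curvatures(sq_curv, sigma):
    """Entrywise affinity exp(-c_p^2 / (2 sigma^2)) for an N x c array."""
    return np.exp(-np.asarray(sq_curv, dtype=float) / (2.0 * sigma ** 2))


def estimate_weights(A):
    """Sampled weights: W_ij ~ sum_r A[i, r] * A[j, r], i.e. W = A A^T."""
    return A @ A.T


def spectral_embedding(W, n_clusters):
    """Unit-normalized rows of the top eigenvectors of D^{-1/2} W D^{-1/2}."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    Z = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(Z)          # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]
    return U / np.linalg.norm(U, axis=1, keepdims=True)


# Hypothetical squared curvatures: 4 points, 2 sampled tuples.
# Points 0, 1 are flat with tuple 0; points 2, 3 are flat with tuple 1.
sq_curv = np.array([[0.0, 9.0], [0.0, 9.0], [9.0, 0.0], [9.0, 0.0]])
A = affinities_from_curvatures(sq_curv, sigma=1.0)
W = estimate_weights(A)
U = spectral_embedding(W, n_clusters=2)
```

In this toy case the embedded rows of points 0 and 1 coincide, as do those of points 2 and 3, so any reasonable clustering of the rows of `U` separates the two groups.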

The convergence of iterative sampling is measured by the total kernel least squares error, $e^2_{\mathrm{KLS}}$, which sums the least squares errors of $\ell$-flats approximation to the clusters $C_1, \ldots, C_K$ in the feature space and can be computed as follows (see supplementary material for proof):

$$e^2_{\mathrm{KLS}} = \sum_{k=1}^{K} \sum_{j > \ell} \lambda_j(\tilde{\mathbf{K}}_{C_k, C_k}), \tag{10}$$

where $\lambda_j(\cdot)$ denotes the $j$-th largest eigenvalue of the matrix, and $\tilde{\mathbf{K}}_{C_k, C_k}$ is a centered version of $\mathbf{K}_{C_k, C_k}$ (the block of the kernel matrix $\mathbf{K}$ corresponding to $C_k$):

$$\tilde{\mathbf{K}}_{C_k, C_k} := \mathbf{K}_{C_k, C_k} - \mathbf{1}_{|C_k|} \cdot \mathbf{K}_{C_k, C_k} - \mathbf{K}_{C_k, C_k} \cdot \mathbf{1}_{|C_k|} + \mathbf{1}_{|C_k|} \cdot \mathbf{K}_{C_k, C_k} \cdot \mathbf{1}_{|C_k|}, \tag{11}$$

in which $|C_k|$ denotes the number of points in $C_k$, and $\mathbf{1}_n$ is the $n \times n$ constant matrix with elements $1/n$.
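The kernel least squares error can be sketched directly from the centered kernel blocks. The example below is a minimal illustration, assuming clusters are given as lists of point indices and using the linear kernel on two hypothetical point sets (a line and the corners of a unit square) so that the result can be checked by hand.

```python
import numpy as np


def center_kernel(K_block):
    """Centered block: K - 1_n K - K 1_n + 1_n K 1_n, with 1_n having entries 1/n."""
    n = K_block.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K_block - one_n @ K_block - K_block @ one_n + one_n @ K_block @ one_n


def kls_error(K, clusters, ell):
    """Sum, over clusters, of the eigenvalues of the centered block beyond the ell largest."""
    total = 0.0
    for idx in clusters:
        idx = np.asarray(idx)
        block = K[np.ix_(idx, idx)]
        eigvals = np.linalg.eigvalsh(center_kernel(block))[::-1]  # descending
        total += eigvals[ell:].sum()
    return total


line = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pts = np.vstack([line, square])
K = pts @ pts.T                                   # linear kernel

err_line = kls_error(K, [[0, 1, 2, 3]], ell=1)
err_both = kls_error(K, [[0, 1, 2, 3], [4, 5, 6, 7]], ell=1)
```

The collinear cluster is fit exactly by a 1-flat, so `err_line` is zero up to round-off, while the best line through the square corners leaves a total squared residual of 1, which is the value of `err_both`.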

In [5] other numerical strategies, such as an automatic 
scheme of tuning the parameter $\sigma$, are also developed to
speed up the SCC algorithm. We employ the same strate- 
gies to boost the performance of the KSCC algorithm and 
describe the main steps of the resulting algorithm in Algo- 
rithm 1. 

The KSCC algorithm employs kernels at two levels. First, it implicitly maps each data point $\mathbf{x}_i$ to a feature vector $\Phi(\mathbf{x}_i)$ and uses only the kernel matrix to compute the polar curvatures in the feature space. Second, the weight matrix $\mathbf{W}$ (see Eq. (7)) can also be interpreted as a kernel that computes dot products in the space $\mathbb{R}^{N^{\ell+1}}$. Indeed, the feature point $\Phi(\mathbf{x}_i)$ is further mapped to the $i$-th slice of the tensor $\mathcal{A}_p$: $\{\mathcal{A}_p(i, i_1, \ldots, i_{\ell+1}),\ 1 \le i_1, \ldots, i_{\ell+1} \le N\}$, which contains the interactions between the point and all $\ell$-flats spanned by any $\ell+1$ points in the feature space.

2.1. Complexity of the KSCC algorithm 

The storage requirement of the KSCC algorithm is $O(N \cdot (D + c))$. The running time is $O(n_s \cdot (\ell+1)^2 \cdot D \cdot N \cdot c)$, where $n_s$ is the number of sampling iterations performed.

We briefly explain how to efficiently compute all the $N - \ell - 1$ polar curvatures for a fixed $(\ell+1)$-tuple of points (with index set $J_r$) in Step 2 of Algorithm 1. The complexity of computing $\det(\mathbf{K}_{[i\,J_r],[i\,J_r]} + \mathbf{1})$ for any point $\mathbf{x}_i$ in the rest of the data is $O((\ell+2)^3)$, which would translate into a total cost of $O(N \cdot (\ell+2)^3)$ for all the $N - \ell - 1$ curvatures. However, in any $(\ell+2)$-tuple, $\ell+1$ of the points are the same. Therefore, we can pre-compute all possible determinants of the form $H_{jk} = \det(\mathbf{K}_{J_r \setminus \{j\},\, J_r \setminus \{k\}} + \mathbf{1})$ in $O((\ell+1)^3)$ time using the fact that $\mathbf{H} = \mathrm{adj}(\mathbf{K}_{J_r, J_r} + \mathbf{1})$. Then, each determinant $\det(\mathbf{K}_{[i\,J_r],[i\,J_r]} + \mathbf{1})$ can be computed in $O((\ell+1)^2)$ time using its cofactor expansion and the pre-computed minors stored in $\mathbf{H}$, for a total cost of $O(N \cdot (\ell+1)^2)$, since $\ell < N$.
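The reuse of the adjugate can be illustrated with the bordered-determinant identity det([[a, b'], [b, M]]) = a det(M) - b' adj(M) b. The sketch below is one possible way to realize the idea and is not taken from the original implementation: it assumes that 1 is the all-ones matrix, and it builds the adjugate with a simple cofactor loop, which is easy to read but slower than the cubic-time route mentioned in the text.

```python
import numpy as np


def adjugate(M):
    """Adjugate via explicit cofactors (simple, adequate for small blocks)."""
    n = M.shape[0]
    adj = np.empty((n, n))
    for j in range(n):
        for k in range(n):
            minor = np.delete(np.delete(M, j, axis=0), k, axis=1)
            # adj[k, j] holds the (j, k) cofactor (note the transpose).
            adj[k, j] = (-1) ** (j + k) * np.linalg.det(minor)
    return adj


def bordered_dets(K, J):
    """det(K_[i J],[i J] + 1) for every point i outside J, reusing adj(K_JJ + 1)."""
    J = np.asarray(J)
    M = K[np.ix_(J, J)] + 1.0            # shared (l+1) x (l+1) block
    det_M, adj_M = np.linalg.det(M), adjugate(M)
    rest = np.setdiff1d(np.arange(K.shape[0]), J)
    out = {}
    for i in rest:
        b = K[J, i] + 1.0                # border column for point i
        out[int(i)] = (K[i, i] + 1.0) * det_M - b @ adj_M @ b
    return out


rng = np.random.default_rng(2)           # arbitrary seed
X = rng.normal(size=(7, 3))              # hypothetical feature vectors
K = X @ X.T
J = [0, 1, 2]
fast = bordered_dets(K, J)
direct = {i: np.linalg.det(K[np.ix_([i] + J, [i] + J)] + 1.0) for i in fast}
```

Each entry of `fast` costs one quadratic form in the precomputed adjugate, whereas each entry of `direct` recomputes a full determinant; the two dictionaries agree up to round-off.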

3. Numerical experiments 
3.1. Artificial data 

To test the KSCC algorithm we have applied it to several 
artificial data sets shown in Fig. 2. 

In Figs. 2(a)-2(d) the data points lie on circles/spheres 
and possibly also on lines/planes. We apply the KSCC al- 
gorithm with the spherical kernel 

$$k_s(\mathbf{x}, \mathbf{y}) = \mathbf{x}' \cdot \mathbf{y} + \|\mathbf{x}\|_2^2 \, \|\mathbf{y}\|_2^2, \tag{12}$$

which directly follows from Eqs. (1) and (3). We note that $\ell = D$ (clearly, the $D + 1$ coordinates of the mapping $\Phi$ in Eq. (1), i.e., $x_1, \ldots, x_D, \|\mathbf{x}\|_2^2$, are affinely independent).

In Fig. 2(e) the data consists of a circle, an ellipse, a 
parabola, and a hyperbola. It is natural to use the full 
quadratic polynomial kernel 

$$k_{2f}(\mathbf{x}, \mathbf{y}) = (1 + \mathbf{x}' \cdot \mathbf{y})^2. \tag{13}$$

This is equivalent to embedding data by the feature map

$$\Phi(x_1, x_2) = (1, \sqrt{2}\,x_1, \sqrt{2}\,x_2, x_1^2, x_2^2, \sqrt{2}\,x_1 x_2). \tag{14}$$

Therefore, the images of the 1-D conics are 4-flats in $\mathbb{R}^6$ (the first coordinate of $\Phi$ is constant, so that only the last five coordinates of $\Phi$ are affinely independent). The KSCC algorithm successfully separates the different conic sections.
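The claim that a conic becomes a 4-flat under this embedding can be checked numerically. The following sketch uses a hypothetical ellipse as test data; it verifies that the full quadratic kernel matches the dot products of the explicit feature map and that the centered feature vectors have at most four non-negligible singular values.

```python
import numpy as np


def quad_feature_map(p):
    """Explicit embedding (1, sqrt2 x1, sqrt2 x2, x1^2, x2^2, sqrt2 x1 x2)."""
    x1, x2 = p
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x1, r2 * x2, x1 ** 2, x2 ** 2, r2 * x1 * x2])


def quad_kernel(p, q):
    """Full quadratic polynomial kernel (1 + p.q)^2."""
    return (1.0 + np.dot(p, q)) ** 2


# Twelve points on the ellipse x1^2 / 4 + x2^2 = 1 (a hypothetical test conic).
t = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
ellipse = np.column_stack([2.0 * np.cos(t), np.sin(t)])

F = np.array([quad_feature_map(p) for p in ellipse])
K = np.array([[quad_kernel(p, q) for q in ellipse] for p in ellipse])
kernel_gap = np.abs(K - F @ F.T).max()

# Singular values of the centered features, in descending order.
sing = np.linalg.svd(F - F.mean(axis=0), compute_uv=False)
```

Only the first four entries of `sing` are non-negligible, which is consistent with the embedded ellipse lying on a 4-flat.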

Fig. 2(f) shows five Lissajous curves in the unit square. 
A Lissajous curve is the graph of the system of parametric 
equations 

$$x = A \sin(at + \delta), \quad y = B \sin(bt). \tag{15}$$

We have required that $a/b = 2$ in Fig. 2(f). In this case, the kernel function can be constructed as follows (see supplementary material for proof):

$$k((x_1, y_1), (x_2, y_2)) = \left(1 + T_1(x_1) \cdot T_1(x_2) + T_2(y_1) \cdot T_2(y_2)\right)^2, \tag{16}$$

where $T_n$ is the Chebyshev polynomial of degree $n$. The KSCC algorithm is then applied with $\ell = 4$ in order to separate the curves.
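The choice of dimension 4 for the Lissajous curves can be sanity-checked numerically. The sketch below assumes a frequency ratio a/b = 2 and a kernel of the form (1 + T_1(x_1) T_1(x_2) + T_2(y_1) T_2(y_2))^2 with T_1(u) = u and T_2(u) = 2u^2 - 1; the amplitudes, phase and sample size are hypothetical values chosen only for the check.

```python
import numpy as np


def chebyshev_kernel(p, q):
    """(1 + T_1(x1) T_1(x2) + T_2(y1) T_2(y2))^2 with T_1(u) = u, T_2(u) = 2u^2 - 1."""
    x1, y1 = p
    x2, y2 = q
    t2_y1 = 2.0 * y1 ** 2 - 1.0
    t2_y2 = 2.0 * y2 ** 2 - 1.0
    return (1.0 + x1 * x2 + t2_y1 * t2_y2) ** 2


def centered(K):
    """Kernel matrix of the mean-centered feature vectors."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n


# One Lissajous curve with a / b = 2 (all parameter values are illustrative).
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
A, B, a, b, delta = 1.0, 0.8, 2.0, 1.0, 0.3
curve = np.column_stack([A * np.sin(a * t + delta), B * np.sin(b * t)])

K = np.array([[chebyshev_kernel(p, q) for q in curve] for p in curve])
eigs = np.linalg.eigvalsh(centered(K))[::-1]      # descending order
```

Only the four leading entries of `eigs` are non-negligible: in the coordinates (x, T_2(y)) the curve is an affine image of a circle, hence a conic, and its image under the quadratic embedding spans a 4-flat.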

Algorithm 1 Kernel Spectral Curvature Clustering (KSCC)

Input: Data set $X$, kernel matrix $\mathbf{K}$, maximal dimension $\ell$ (in feature space), number of manifolds $K$, and number of sampled $(\ell+1)$-tuples $c$ (default = $100 \cdot K$).

Output: $K$ disjoint clusters $C_1, \ldots, C_K$.

Steps:

1: Sample randomly $c$ subsets of $X$ (with indices $J_1, \ldots, J_c$), each containing $\ell + 1$ distinct points.

2: For each sampled subset $J_r$, compute the squared polar curvature of it and each of the remaining $N - \ell - 1$ points in $X$ by Eq. (5). Sort increasingly these $c \cdot (N - \ell - 1)$ squared curvatures into a vector $\mathbf{c}$.

3: for $p = 1$ to $\ell + 1$ do

   • Use Eq. (6) together with $\sigma^2 = \mathbf{c}(N \cdot c / K^p)$ to compute the $(N - \ell - 1) \cdot c$ affinities and estimate the weights $\mathbf{W}$ via Eq. (9).

   • Apply spectral clustering [17] to these weights and find a partition of the data $X$ into $K$ clusters (can follow the corresponding steps of the SCC algorithm [5]).

   end for

   Record the partition $C_1, \ldots, C_K$ that has the smallest total KLS error, i.e., $e^2_{\mathrm{KLS}}$ of Eq. (10), for the corresponding $K$ $\ell$-flats in the feature space.

4: Sample $c/K$ $(\ell+1)$-tuples of points from each $C_k$ found above and repeat Steps 2 and 3 to find $K$ newer clusters. Iterate until convergence to obtain a best segmentation.

We have also tried to apply other competing algorithms to the data in Fig. 2. Those algorithms are divided into two categories.

The first category is local algorithms (e.g., [20, 13, 9, 8, 2]), i.e., algorithms that are based on local geometries. Because most of the data sets (in Fig. 2) are sparsely sampled and consist of intersecting clusters, these methods are expected to fail. In addition, they are generally not suitable for segmenting manifolds using a pre-specified model. Fig. 3 shows the failure of one such algorithm, K-Manifolds [20], on the two most densely sampled data sets (Figs. 2(c) and 2(d)). We observed in experiments that the K-Manifolds algorithm tends to find arbitrary smooth manifolds that are far from the underlying models.

Figure 3. Demonstration of failure of local algorithms on data sets in Fig. 2.

The second category is other hybrid linear modeling algorithms, such as K-Flats [12, 11, 3, 23], MoPPCA [21], GPCA [24, 16], LSA [25], and ALC [15]. They can be applied to segment the manifolds in Fig. 2 in the same feature spaces as those corresponding to KSCC, where the manifolds are mapped to flats. However, since none of these methods performs well on general affine subspaces, their performance on the data in Fig. 2 (in feature space) is expected to be very poor.

Figure 4. Demonstration of failure of other hybrid linear modeling algorithms when applied to the data of Fig. 2 in embedded spaces. (a) GPCA (in feature space). (b) LSA (in feature space).

We first applied GPCA and LSA to all data sets in Fig. 2 
(after being mapped to affine subspaces), and obtained that 
the segmentation errors were all around 50%. Fig. 4 shows 
their segmentation results on two data sets in Fig. 2. We also 
applied ALC, K-Flats and MoPPCA to all the data sets (in Fig. 2) in the feature spaces. We found that the number of clusters found by ALC is very sensitive to its tuning parameter ($\varepsilon$), and even when ALC found the correct number of clusters, the clusters were far from the truth. For both MoPPCA and K-Flats, we used ten restarts (but only recorded the best result), and still observed that the results were all very poor. For fair comparison, we also directly applied the
SCC algorithm [5] in the same feature spaces and found that 
it succeeded on each data set in Fig. 2. 

3.2. Two-view motion segmentation 

In this section we compare the performance of the KSCC 
algorithm with one competing method on 13 real data se- 
quences that are studied in [18] (and references therein): 
(1) boxes; (2) carsnbus3; (3) deliveryvan; (4) desk; (5) 
lightbulb; (6) manycars; (7) man-in-office; (8) nrbooks3;
(9) office; (10) parking-lot; (11) posters-checkerboard; (12) 
posters-keyboard; and (13) toys-on-table. Each sequence 
consists of two image frames of a 3-D dynamic scene taken 
by a perspective camera (see Fig. 5), and the task is to sep- 
arate the trajectories of some feature points (tracked on the 
moving objects) in the two camera views of the scene. This application lies in the field of structure from motion, which is one of the fundamental problems in computer vision.

Figure 5. Two sample sequences. Panel (b): office.

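Given a point $\mathbf{x} \in \mathbb{R}^3$ in space and its image correspondences $(x_1, y_1)', (x_2, y_2)' \in \mathbb{R}^2$ in two views, one can form a joint image sample $\mathbf{y} = (x_1, y_1, x_2, y_2, 1)' \in \mathbb{R}^5$. It is shown (e.g., in [18]) that, under perspective camera projection, all the joint image samples $\mathbf{y}$ corresponding to one motion live on a distinct quadratic manifold in $\mathbb{R}^5$. More precisely, for a 3-D rigid-body motion, there exists a symmetric 5-by-5 matrix

$$\mathbf{H} = \begin{pmatrix} 0 & 0 & h_1 & h_2 & h_3 \\ 0 & 0 & h_4 & h_5 & h_6 \\ h_1 & h_4 & 0 & 0 & h_7 \\ h_2 & h_5 & 0 & 0 & h_8 \\ h_3 & h_6 & h_7 & h_8 & h_9 \end{pmatrix} \tag{17}$$

such that

$$\mathbf{y}' \cdot \mathbf{H} \cdot \mathbf{y} = 0; \tag{18}$$

and for a 2-D planar motion, there exist three matrices $\mathbf{H}_1, \mathbf{H}_2, \mathbf{H}_3$ of the same form as in Eq. (17), such that

$$\mathbf{y}' \cdot \mathbf{H}_i \cdot \mathbf{y} = 0, \quad i = 1, 2, 3. \tag{19}$$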

This fact has been used by the Robust Algebraic Segmenta- 
tion (RAS) algorithm [18] for constructing the perspective 
Veronese map in order to segment the motions. 

To solve the two-view motion segmentation problem, we also apply the above result but will show that each motion uniquely determines a 7-flat (for 3-D rigid-body motion) or 5-flat (for 2-D planar motion) in the space $\mathbb{R}^9$ and that KSCC can be applied to the original 4-D point correspondences $(x_1, y_1, x_2, y_2)$ via a properly constructed kernel function. Indeed, if we define $\mathbf{z} := (x_1, y_1, 1) \otimes (x_2, y_2, 1)$, where $\otimes$ denotes the Kronecker product, then Eq. (18) can be rewritten into a linear equation as follows:

$$\mathbf{z} \cdot (2h_1, 2h_2, 2h_3, 2h_4, 2h_5, 2h_6, 2h_7, 2h_8, h_9)' = 0, \tag{20}$$

and Eq. (19) into three such linear equations.
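The passage from the quadratic constraint to a linear one can be verified numerically. The sketch below assumes that the symmetric matrix has zero 2-by-2 diagonal blocks and entries h_1, ..., h_9 arranged so that the quadratic form expands into the coefficients (2h_1, ..., 2h_8, h_9); this arrangement is inferred from the linear equation and should be treated as an assumption, as are the random test values.

```python
import numpy as np


def build_h_matrix(h):
    """Symmetric 5x5 matrix with zero 2x2 diagonal blocks, built from h = (h1, ..., h9)."""
    h1, h2, h3, h4, h5, h6, h7, h8, h9 = h
    return np.array([
        [0.0, 0.0, h1, h2, h3],
        [0.0, 0.0, h4, h5, h6],
        [h1, h4, 0.0, 0.0, h7],
        [h2, h5, 0.0, 0.0, h8],
        [h3, h6, h7, h8, h9],
    ])


rng = np.random.default_rng(1)                 # arbitrary seed
h = rng.normal(size=9)                         # hypothetical coefficients
x1, y1, x2, y2 = rng.normal(size=4)            # hypothetical correspondence

# Quadratic form in the joint image sample y = (x1, y1, x2, y2, 1).
y = np.array([x1, y1, x2, y2, 1.0])
quadratic_form = y @ build_h_matrix(h) @ y

# Linear form in the Kronecker-product feature z.
z = np.kron([x1, y1, 1.0], [x2, y2, 1.0])
coeffs = np.append(2.0 * h[:8], h[8])
linear_form = z @ coeffs
```

The two quantities agree for arbitrary inputs, so a point satisfies the quadratic constraint exactly when its feature vector `z` satisfies the linear one.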

Therefore, if the 4-D point correspondences $(x_1, y_1, x_2, y_2)$ are mapped to the 9-D feature vectors $\mathbf{z}$, then one can segment the motions by clustering 7-flats and 5-flats in $\mathbb{R}^9$ (the last coordinate of $\mathbf{z}$ is constant). We follow the same idea of SCC [5] to use only the maximal dimension when having mixed dimensions, which seems to be effective in many cases. Therefore, we apply the KSCC algorithm with $\ell = 7$, together with the following kernel function:

$$\begin{aligned} k((x_1, y_1, x_2, y_2), (u_1, v_1, u_2, v_2)) &= ((x_1, y_1, 1) \otimes (x_2, y_2, 1)) \cdot ((u_1, v_1, 1) \otimes (u_2, v_2, 1))' \\ &= (x_1 u_1 + y_1 v_1 + 1) \cdot (x_2 u_2 + y_2 v_2 + 1). \end{aligned} \tag{21}$$
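The factorization of the two-view kernel follows from the identity (a ⊗ b) · (c ⊗ d) = (a · c)(b · d). A minimal sketch that checks it on hypothetical correspondences (function names and the random seed are illustrative):

```python
import numpy as np


def two_view_feature(c):
    """z = (x1, y1, 1) kron (x2, y2, 1) in R^9 for a correspondence c = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = c
    return np.kron([x1, y1, 1.0], [x2, y2, 1.0])


def two_view_kernel(c, d):
    """Closed form: (x1 u1 + y1 v1 + 1) * (x2 u2 + y2 v2 + 1)."""
    x1, y1, x2, y2 = c
    u1, v1, u2, v2 = d
    return (x1 * u1 + y1 * v1 + 1.0) * (x2 * u2 + y2 * v2 + 1.0)


rng = np.random.default_rng(3)            # arbitrary seed
c, d = rng.normal(size=(2, 4))            # two hypothetical correspondences

explicit = two_view_feature(c) @ two_view_feature(d)
implicit = two_view_kernel(c, d)
```

The closed form needs a handful of multiplications on the 4-D correspondences, whereas the explicit route first builds two 9-D feature vectors; both give the same value.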

We use the outliers-free version of the 13 data sets 
from [18] in order to solely focus on the clustering aspect. 
We apply the KSCC algorithm (with the default c) to each 
sequence and record the misclassification rate (in percent- 
age) and the running time (in seconds). To mitigate the ef- 
fect of randomness due to initial sampling, we repeat this 
experiment 200 times and compute a mean error $e_{\mathrm{mean}}$, a standard deviation $e_{\mathrm{std}}$, as well as an average running time $t$. We have also applied the RAS algorithm [18] to these outlier-free data, and found that RAS relies on an angleTolerance parameter (the output was quite sensitive to choices of this parameter). We tried a few different values and combined the results into a coherent clustering. All the experiments were performed on an IBM T43p laptop with a 2.13 GHz Intel Pentium M processor and 1 GB of RAM. The re-
sults obtained by the two algorithms are summarized in Ta- 
ble 1 (note that the results of RAS are not reported in [18]). 

We observe that the KSCC and RAS algorithms (a) ob- 
tain the same classification error on sequences 1, 5, 9 (on se- 
quence 9 the difference is negligible: 0.05% • 259 = 0.13); 
(b) have significantly different misclassification rates (i.e., 
with a difference larger than 4%) on eight sequences (3, 4, 
6, 7, 8, 10, 11, 13), with each algorithm having a better per- 
formance on four of them (KSCC on sequences 4, 7, 8, 13, 
and RAS on sequences 3, 6, 10, 11); (c) have very small 
difference on sequences 2, 12 (about 1%, i.e., 3 points). In 
terms of running time, the KSCC algorithm is at least twice 
as fast as RAS on all sequences (except sequence 7), some- 
times even being five times faster (on sequence 8). In summary, the KSCC and RAS algorithms have comparable segmentation errors overall (each is better on four sequences); however, the KSCC algorithm is faster.



Table 1. The misclassification rates (in percentage) and the running times (in seconds) of the KSCC and RAS algorithms when applied to the 13 sequences. The second column presents the number of samples N in each sequence. Due to randomness, the KSCC algorithm is applied 200 times to each sequence, and a mean e_mean and a standard deviation e_std of the errors are computed.

              |          KSCC           |      RAS
Seq.     N    | e_mean    e_std     t   |   e        t
 1      236   |  0.85%    0.00%   2.14  |  0.85%    6.68
 2      219   |  1.04%    5.63%   2.05  |  0.00%    6.68
 3      254   |  30.8%    5.59%   2.59  |  15.4%    6.84
 4      155   |  0.22%    1.04%   1.99  |  5.16%    4.50
 5      205   |  0.00%    0.00%   1.86  |  0.00%    5.29
 6      144   |  15.7%    8.96%   0.70  |  0.00%    3.91
 7       73   |  0.63%    2.76%   1.87  |  15.1%    1.17
 8      388   |  1.97%    5.45%   3.43  |  11.1%    20.2
 9      259   |  0.05%    0.13%   2.18  |  0.00%    6.68
10      136   |  22.3%    18.7%   1.18  |  0.00%    2.89
11      280   |  4.97%    1.03%   2.68  |  0.00%    10.5
12      297   |  1.38%    0.80%   2.62  |  0.34%    9.49
13       91   |  2.89%    3.78%   0.87  |  18.7%    1.84



4. Discussions and future work 

We have combined the SCC algorithm [5] and kernels to 
suggest the KSCC algorithm (Algorithm 1) for segmenting 
parametric surfaces which can be mapped to flats in spaces 
of moderate dimensions. The computational task is per- 
formed solely in the original data space (using the kernel 
matrix), thus one would expect KSCC to be faster than per- 
forming SCC in the embedded spaces (when having large 
dimensions). We have exemplified its success on a few ar- 
tificial instances of multi-manifold modeling and on a real- 
world application of two-view motion segmentation under 
perspective camera projection. 

There are several important issues that need to be further 
explored in order to more broadly and successfully apply 
the KSCC algorithm. 

1) The choice of the kernel function might affect the success of the KSCC algorithm. We exemplify this on the three-sphere data in Fig. 2(c), where we used the spherical kernel (see Eq. (12)) and practically segmented 3-flats in $\mathbb{R}^4$. We now apply KSCC to the same data with the following two choices of kernels: the standard quadratic polynomial kernel

$$k_{2s}(\mathbf{x}, \mathbf{y}) = \langle \mathbf{x}, \mathbf{y} \rangle + \langle \mathbf{x}.^2, \mathbf{y}.^2 \rangle, \tag{22}$$

where $.^2$ means taking elementwise squares, and the full quadratic polynomial kernel $k_{2f}(\mathbf{x}, \mathbf{y})$ of Eq. (13), respectively. This is equivalent to applying SCC to segment 5-flats in $\mathbb{R}^6$ and 8-flats in $\mathbb{R}^9$, respectively. The corresponding results are shown in Fig. 6.




Figure 6. Output of the KSCC algorithm when applied to the three spheres in Fig. 2(c) with the two kernels $k_{2s}$ and $k_{2f}$, respectively.

As it turned out, the KSCC algorithm failed with the full quadratic kernel $k_{2f}$. The reason for this is that as $\ell$ increases, the segmentation task becomes more difficult for KSCC since the initial approximation of the weight matrix $\mathbf{W}$ (defined in Eq. (7)) would deteriorate. This experiment suggests that one should use the optimal kernel function with KSCC in the sense that it should minimize the intrinsic dimension of the flats in the feature space. We are currently developing an automatic scheme to choose the least number of terms that are necessary for linearization of the manifolds.

2) The dimension of the flats in the feature space is often quite large, sometimes even with the optimal kernel function (for example, $\ell = 25$ in the problem of segmenting motions from three perspective camera views [10]). Due to the limitation of the KSCC algorithm in dealing with large $\ell$, a better initialization strategy needs to be explored in order to more robustly estimate the initial weights $\mathbf{W}$. We plan to develop such a technique in later research and consequently apply the improved KSCC algorithm to solve the three-view motion segmentation problem [10].

3) We need to examine more carefully the situation when 
data is corrupted with noise. Though KSCC can handle 
small levels of noise, the clustering task becomes very chal- 
lenging for KSCC (and probably for any other manifold 
clustering algorithm) when the noise level increases in the 
original space. There are two reasons for this. First, in 
many cases the manifold structure is quickly obscured when
corrupted with noise (see e.g., Figs. 2(b) and 2(e)). Second, 
the noise level is further enlarged in feature space due to the 
embedding having higher-order terms. One has to develop 
theoretical guarantees for good performance of KSCC in the 
presence of noise (as done for SCC in [4]), and use related 
insights for improving the current algorithm. For example, 
one can note the effect of special geometric transformations 
of the data, under which the noise distortion (from origi- 
nal to feature space) is minimal, and apply them before the 
KSCC algorithm. 

4) We need to study the performance of KSCC on data 
contaminated with outliers. Solutions can follow the idea 
used in the SCC algorithm [5] and possibly combined with 
RANdom SAmple Consensus (RANSAC) [6, 22, 26], in a 



similar way as the RAS algorithm [18]. Future work will 
test the KSCC algorithm on the 13 data sets (in Table 1) in 
the presence of outliers. 